Summary

  • The goal of this project is to use machine learning to predict antigen (Ag) archiving capacity in lymphatic endothelial cells (LECs) from naive mice.
  • As a proof of principle, Ag-tracking data for d14 cLECs were used to train a random forest classifier to predict Ag status.
  • Using this model, we defined a gene program that correlates with Ag status at various timepoints.
  • Archiving-competent cLECs can be predicted in the CHIKV LN scRNA-seq data.
  • CHIKV-infected mice show a reduction in archiving-competent cLECs and a broad downregulation of the Ag-archiving gene program.
  • Going forward, the central goal is to optimize the model (e.g. expand it to other cell types) and use it to assess archiving capacity in samples that did not receive an Ag tag (e.g. other published datasets). We can then identify perturbations or treatments that are predicted to impair archiving.


Classifying Ag-high

Ag-low and Ag-high cells were identified by clustering each LEC subset within each sample into two groups based on Ag score. For the 6wk-3wk sample, the 3wk Ag score was used. The Ag-low/high classifications used for the analysis are shown below.
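The two-group split described above can be sketched as follows. This is a minimal illustration in Python, not the code used for this report (the analysis itself was run in R). The clustering method is not stated here, so 1-D k-means with k = 2 is an assumption, and the field names `sample`, `subset`, and `ag_score`, as well as the subset name `other_LEC`, are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def split_two_groups(scores, n_iter=100):
    """1-D k-means with k = 2 (an assumed method); returns 'low'/'high' per score."""
    lo, hi = min(scores), max(scores)
    if lo == hi:
        # No spread in the scores, so nothing can be called 'high'
        return ["low"] * len(scores)
    for _ in range(n_iter):
        cut = (lo + hi) / 2
        new_lo = mean(s for s in scores if s <= cut)
        new_hi = mean(s for s in scores if s > cut)
        if (new_lo, new_hi) == (lo, hi):
            break  # centers stopped moving
        lo, hi = new_lo, new_hi
    cut = (lo + hi) / 2
    return ["high" if s > cut else "low" for s in scores]


def classify_cells(cells):
    """Split each (sample, subset) group separately, as described in the text."""
    groups = defaultdict(list)
    for i, cell in enumerate(cells):
        groups[(cell["sample"], cell["subset"])].append(i)
    labels = [None] * len(cells)
    for idx in groups.values():
        group_labels = split_two_groups([cells[i]["ag_score"] for i in idx])
        for i, lab in zip(idx, group_labels):
            labels[i] = lab
    return labels


# Toy input with made-up scores, for illustration only
toy_cells = [
    {"sample": "d14", "subset": "cLEC", "ag_score": s}
    for s in (0.1, 0.2, 0.15, 2.0, 2.2)
] + [
    {"sample": "d14", "subset": "other_LEC", "ag_score": s}
    for s in (5.0, 5.5, 9.0)
]
print(classify_cells(toy_cells))
```

Because each subset is split on its own, a score of 5.0 is called low in the second toy subset even though it exceeds every score in the first. This is the practical consequence of clustering subsets and samples separately rather than applying one global cutoff.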


A random forest classifier was trained using data for d14 cLECs. The model was then used to predict Ag-high cells in the other Ag datasets.

The fraction of cells belonging to each predicted Ag group is shown on the left for cLECs from each sample. The fraction of true Ag-low, true Ag-high, and false-positive Ag-high cells (high-pred) is shown on the right.

  • The model is fairly accurate in predicting Ag-high cells in the training and test data (d14 cLECs), but does not perform as well when predicting Ag-low cells. This could likely be improved with further optimization.
  • Since we want to identify gene signatures that are expressed in naive mice and continue to be expressed after Ag levels have fallen, we expect to observe an increasing fraction of false-positive Ag-high cells at the later timepoints.
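The three display classes and the per-class accuracy discussed above can be illustrated with a short sketch. It is written in Python for illustration and is not the report's own code. It assumes that "true Ag-high" means measured Ag-high, that "high-pred" means measured Ag-low but predicted Ag-high, and that the remaining cells are "low"; the function names are hypothetical.

```python
from collections import Counter


def label_cells(measured, predicted):
    """Combine measured and predicted Ag status into one display class per cell."""
    classes = []
    for m, p in zip(measured, predicted):
        if m == "high":
            classes.append("high")        # measured Ag-high
        elif p == "high":
            classes.append("high-pred")   # measured low, predicted high
        else:
            classes.append("low")         # measured low, predicted low
    return classes


def class_fractions(classes):
    """Fraction of cells falling in each display class."""
    counts = Counter(classes)
    return {k: counts[k] / len(classes) for k in ("low", "high", "high-pred")}


def recall(measured, predicted, cls):
    """Of the cells measured as `cls`, the fraction the model also calls `cls`."""
    preds = [p for m, p in zip(measured, predicted) if m == cls]
    return sum(p == cls for p in preds) / len(preds)


# Toy labels, made up for illustration only
measured = ["high"] * 3 + ["low"] * 5
predicted = ["high"] * 5 + ["low"] * 3

print(class_fractions(label_cells(measured, predicted)))
print(recall(measured, predicted, "high"), recall(measured, predicted, "low"))
```

Reporting recall separately for each class makes an imbalance like the one noted above visible: a model can recover nearly all Ag-high cells while still mislabeling a sizeable share of Ag-low cells.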




Ag modules

Expression of the top upregulated (top) and downregulated (bottom) gene modules that are most predictive of Ag signal are shown below.

  • Expression of these genes correlates with Ag class.
  • False-positive Ag-high cells (high-pred) show an intermediate level of expression that falls roughly between the true Ag-low and true Ag-high cells.
  • The false-positive Ag-high cells are potentially archiving-competent cells that have lost or released most of their Ag by the later timepoints.
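One simple way to summarize a gene module per cell, and then compare the classes discussed above, is to average the expression of the module genes. The sketch below is in Python and is only an assumption about how such a score could be computed; the report does not state the scoring method, and the gene names `geneA` and `geneB` are placeholders.

```python
from statistics import mean


def module_score(expr, module_genes):
    """Per-cell mean expression across the module genes found in `expr`.

    expr maps gene name -> list of per-cell expression values.
    """
    genes = [g for g in module_genes if g in expr]
    if not genes:
        raise ValueError("none of the module genes are present")
    n_cells = len(expr[genes[0]])
    return [mean(expr[g][i] for g in genes) for i in range(n_cells)]


def mean_by_class(scores, classes):
    """Average module score within each Ag class."""
    groups = {}
    for s, c in zip(scores, classes):
        groups.setdefault(c, []).append(s)
    return {c: mean(v) for c, v in groups.items()}


# Toy expression values, made up for illustration only
expr = {
    "geneA": [0.0, 0.0, 1.0, 1.0, 2.0, 2.0],
    "geneB": [0.0, 2.0, 2.0, 2.0, 4.0, 4.0],
}
classes = ["low", "low", "high-pred", "high-pred", "high", "high"]

scores = module_score(expr, ["geneA", "geneB"])
print(mean_by_class(scores, classes))
```

In this toy input the high-pred cells land between the low and high groups, which is the pattern described in the bullets above.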


UMAP projections show Ag-high module expression for each sample; the main cLEC cluster is circled.


UMAP projections show true Ag-low and true Ag-high cLECs.

  • Ag-high cells occupy the far-right portion of the cLEC cluster, which coincides with the highest expression of the Ag-high gene module


UMAP projections show false positive Ag-high and true Ag-high cLECs.

  • False positive Ag-high cells (high-pred) show strong overlap with true Ag-high cells




Ag features

Mean expression in cLECs is shown on the left for genes from the Ag-high module for true Ag-low, true Ag-high, and false-positive Ag-high (high-pred) cells. Expression of select top features is shown on the right.

Points show median expression, grey bars show the interquartile range, the dotted line shows the trend, and arrows indicate that the gene is significantly up- or downregulated relative to Ag-low cells.
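The point-and-bar summary described above (median with interquartile range per group) can be computed as in the following Python sketch. It is illustrative only: the quartile interpolation rule used for the report's figures is not stated, so the "inclusive" method here is an assumption, and the input values are made up.

```python
from statistics import median, quantiles


def median_iqr(values):
    """Return (median, Q1, Q3) for one group; needs at least two values."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q1, q3


def summarize_by_class(values, classes):
    """Median and interquartile range of `values` within each Ag class."""
    groups = {}
    for v, c in zip(values, classes):
        groups.setdefault(c, []).append(v)
    return {c: median_iqr(v) for c, v in groups.items()}


# Toy per-gene mean expression values, made up for illustration only
values = [1.0, 2.0, 3.0, 4.0, 5.0, 2.0, 4.0, 6.0, 8.0, 10.0]
classes = ["low"] * 5 + ["high"] * 5
print(summarize_by_class(values, classes))
```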


Expression of the Ag-low module is shown as described above.




Ag archiving in CHIKV

The fraction of cells predicted to be Ag-high (i.e. archiving competent) is shown below for each biological replicate. p-values < 0.05 are shown.

  • There is a reduction in archiving-competent cells in CHIKV-infected LN samples
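The per-replicate comparison above can be sketched in Python as follows. The statistical test behind the reported p-values is not stated in this report, so the exact permutation test used here is a stand-in chosen for illustration, and the replicate fractions are made-up toy numbers, not results.

```python
from itertools import combinations
from statistics import mean


def fraction_high(pred_labels):
    """Fraction of cells in one replicate that are predicted Ag-high."""
    return sum(lab == "high" for lab in pred_labels) / len(pred_labels)


def permutation_p(group_a, group_b):
    """Exact two-sided permutation test on the difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    hits = total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance so ties with the observed split are counted
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            hits += 1
    return hits / total


# Toy per-replicate fractions, made up for illustration only
mock = [0.30, 0.32, 0.28]
chikv = [0.15, 0.18, 0.12]
print(permutation_p(mock, chikv))
```

With three replicates per group, as in this toy input, there are only 20 possible relabelings, so the smallest two-sided p-value this test can return is 0.1. A parametric test on the replicate fractions would behave differently, which is one reason to state the test alongside any reported p-value.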


Expression of the Ag-high (top) and Ag-low (bottom) gene modules is shown below for each predicted Ag class for the 24 hpi timepoint.

  • Cells predicted to be archiving-competent show upregulation of the Ag-high gene module
  • The Ag-low gene module shows similar expression between samples


Mean expression is shown for genes from the Ag-high module for mock- and CHIKV-infected mice from the 24 hpi timepoint.

  • CHIKV-infected mice broadly downregulate the Ag-high module
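A statement such as "broadly downregulated" can be made concrete by counting how many module genes have lower mean expression in the infected condition. The Python sketch below assumes log-normalized expression, so a difference in means approximates a log fold change. The gene names and values are placeholders, and it does not reproduce the report's analysis.

```python
from statistics import mean


def fraction_downregulated(expr_mock, expr_chikv, module_genes):
    """Per-gene change in mean expression (CHIKV minus mock) and the
    fraction of module genes whose mean is lower in CHIKV."""
    diffs = {
        g: mean(expr_chikv[g]) - mean(expr_mock[g])
        for g in module_genes
    }
    n_down = sum(d < 0 for d in diffs.values())
    return diffs, n_down / len(diffs)


# Toy expression values, made up for illustration only
expr_mock = {
    "geneA": [2.0, 2.0], "geneB": [1.0, 3.0],
    "geneC": [0.5, 0.5], "geneD": [4.0, 4.0],
}
expr_chikv = {
    "geneA": [1.0, 1.0], "geneB": [1.0, 1.0],
    "geneC": [1.0, 1.0], "geneD": [2.0, 2.0],
}

diffs, frac_down = fraction_downregulated(
    expr_mock, expr_chikv, ["geneA", "geneB", "geneC", "geneD"]
)
print(diffs, frac_down)
```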




Session info

## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.3 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8          LC_NUMERIC=C                 
##  [3] LC_TIME=en_US.UTF-8           LC_COLLATE=en_US.UTF-8       
##  [5] LC_MONETARY=en_US.UTF-8       LC_MESSAGES=en_US.UTF-8      
##  [7] LC_PAPER=en_US.UTF-8          LC_NAME=en_US.UTF-8          
##  [9] LC_ADDRESS=en_US.UTF-8        LC_TELEPHONE=en_US.UTF-8     
## [11] LC_MEASUREMENT=en_US.UTF-8    LC_IDENTIFICATION=en_US.UTF-8
## 
## time zone: America/Denver
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] tools     grid      stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] furrr_0.3.1           future_1.33.0         ranger_0.15.1        
##  [4] rsample_1.2.0         harmony_1.0.3         biomaRt_2.56.1       
##  [7] openxlsx_4.2.5.2      MetBrewer_0.2.0       rdrop2_0.8.2.1       
## [10] ggtext_0.1.2          ggtrace_0.2.0         qs_0.25.5            
## [13] vroom_1.6.3           M3Drop_1.26.0         numDeriv_2016.8-1.1  
## [16] djvdj_0.1.0           gtools_3.9.4          clustifyrdata_1.1.0  
## [19] here_1.0.1            presto_1.0.0          data.table_1.14.8    
## [22] Rcpp_1.0.11           devtools_2.4.5        usethis_2.2.2        
## [25] ComplexHeatmap_2.16.0 patchwork_1.1.3       scales_1.2.1         
## [28] boot_1.3-28.1         clustifyr_1.12.0      mixtools_2.0.0       
## [31] broom_1.0.5           colorblindr_0.1.0     colorspace_2.1-0     
## [34] xlsx_0.6.5            RColorBrewer_1.1-3    ggrepel_0.9.3        
## [37] cowplot_1.1.1         knitr_1.44            gprofiler2_0.2.2     
## [40] SeuratObject_4.1.4    Seurat_4.4.0          ggforce_0.4.1        
## [43] ggbeeswarm_0.7.2      lubridate_1.9.3       forcats_1.0.0        
## [46] stringr_1.5.0         dplyr_1.1.3           purrr_1.0.2          
## [49] readr_2.1.4           tidyr_1.3.0           tibble_3.2.1         
## [52] ggplot2_3.4.3         tidyverse_2.0.0      
## 
## loaded via a namespace (and not attached):
##   [1] IRanges_2.34.1              progress_1.2.2             
##   [3] urlchecker_1.0.1            nnet_7.3-19                
##   [5] goftest_1.2-3               Biostrings_2.68.1          
##   [7] rstan_2.26.23               vctrs_0.6.3                
##   [9] spatstat.random_3.1-6       RApiSerialize_0.1.2        
##  [11] digest_0.6.33               png_0.1-8                  
##  [13] shape_1.4.6                 deldir_1.0-9               
##  [15] parallelly_1.36.0           MASS_7.3-60                
##  [17] reshape2_1.4.4              httpuv_1.6.11              
##  [19] foreach_1.5.2               BiocGenerics_0.46.0        
##  [21] withr_2.5.1                 xfun_0.40                  
##  [23] ellipsis_0.3.2              survival_3.5-5             
##  [25] memoise_2.0.1               profvis_0.3.8              
##  [27] zoo_1.8-12                  GlobalOptions_0.1.2        
##  [29] pbapply_1.7-2               entropy_1.3.1              
##  [31] Formula_1.2-5               prettyunits_1.2.0          
##  [33] KEGGREST_1.40.1             promises_1.2.1             
##  [35] httr_1.4.7                  globals_0.16.2             
##  [37] fitdistrplus_1.1-11         ps_1.7.5                   
##  [39] stringfish_0.15.8           rstudioapi_0.15.0          
##  [41] miniUI_0.1.1.1              generics_0.1.3             
##  [43] base64enc_0.1-3             processx_3.8.2             
##  [45] curl_5.0.2                  S4Vectors_0.38.2           
##  [47] zlibbioc_1.46.0             polyclip_1.10-6            
##  [49] GenomeInfoDbData_1.2.10     xtable_1.8-4               
##  [51] doParallel_1.0.17           evaluate_0.22              
##  [53] S4Arrays_1.0.6              BiocFileCache_2.8.0        
##  [55] hms_1.1.3                   GenomicRanges_1.52.0       
##  [57] irlba_2.3.5.1               filelock_1.0.2             
##  [59] ROCR_1.0-11                 reticulate_1.32.0          
##  [61] spatstat.data_3.0-1         magrittr_2.0.3             
##  [63] lmtest_0.9-40               later_1.3.1                
##  [65] lattice_0.21-8              spatstat.geom_3.2-5        
##  [67] future.apply_1.11.0         scattermore_1.2            
##  [69] XML_3.99-0.14               matrixStats_1.0.0          
##  [71] RcppAnnoy_0.0.21            Hmisc_5.1-1                
##  [73] pillar_1.9.0                StanHeaders_2.26.28        
##  [75] nlme_3.1-162                iterators_1.0.14           
##  [77] caTools_1.18.2              compiler_4.3.1             
##  [79] stringi_1.7.12              tensor_1.5                 
##  [81] SummarizedExperiment_1.30.2 plyr_1.8.8                 
##  [83] crayon_1.5.2                abind_1.4-5                
##  [85] sp_2.0-0                    bit_4.0.5                  
##  [87] fastmatch_1.1-4             codetools_0.2-19           
##  [89] bslib_0.5.1                 QuickJSR_1.0.6             
##  [91] GetoptLong_1.0.5            plotly_4.10.2              
##  [93] mime_0.12                   splines_4.3.1              
##  [95] circlize_0.4.15             dbplyr_2.3.4               
##  [97] gridtext_0.1.5              blob_1.2.4                 
##  [99] utf8_1.2.3                  clue_0.3-65                
## [101] reldist_1.7-2               fs_1.6.3                   
## [103] listenv_0.9.0               checkmate_2.2.0            
## [105] pkgbuild_1.4.2              Matrix_1.6-1.1             
## [107] callr_3.7.3                 statmod_1.5.0              
## [109] tzdb_0.4.0                  tweenr_2.0.2               
## [111] pkgconfig_2.0.3             cachem_1.0.8               
## [113] RSQLite_2.3.1               viridisLite_0.4.2          
## [115] DBI_1.1.3                   fastmap_1.1.1              
## [117] rmarkdown_2.25              ica_1.0-3                  
## [119] sass_0.4.7                  RANN_2.6.1                 
## [121] rpart_4.1.19                farver_2.1.1               
## [123] mgcv_1.8-42                 yaml_2.3.7                 
## [125] MatrixGenerics_1.12.3       foreign_0.8-84             
## [127] cli_3.6.1                   stats4_4.3.1               
## [129] leiden_0.4.3                lifecycle_1.0.3            
## [131] uwot_0.1.16                 Biobase_2.60.0             
## [133] mvtnorm_1.2-3               kernlab_0.9-32             
## [135] sessioninfo_1.2.2           backports_1.4.1            
## [137] BiocParallel_1.34.2         timechange_0.2.0           
## [139] gtable_0.3.4                rjson_0.2.21               
## [141] ggridges_0.5.4              densEstBayes_1.0-2.2       
## [143] progressr_0.14.0            parallel_4.3.1             
## [145] jsonlite_1.8.7              bitops_1.0-7               
## [147] bit64_4.0.5                 Rtsne_0.16                 
## [149] spatstat.utils_3.0-3        zip_2.3.0                  
## [151] RcppParallel_5.1.7          bdsmatrix_1.3-6            
## [153] jquerylib_0.1.4             loo_2.6.0                  
## [155] segmented_1.6-4             lazyeval_0.2.2             
## [157] shiny_1.7.5                 htmltools_0.5.6            
## [159] rJava_1.0-6                 sctransform_0.4.0          
## [161] rappdirs_0.3.3              glue_1.6.2                 
## [163] XVector_0.40.0              RCurl_1.98-1.12            
## [165] rprojroot_2.0.3             gridExtra_2.3              
## [167] igraph_1.5.1                R6_2.5.1                   
## [169] SingleCellExperiment_1.22.0 gplots_3.1.3               
## [171] labeling_0.4.3              xlsxjars_0.6.1             
## [173] cluster_2.1.4               bbmle_1.0.25               
## [175] pkgload_1.3.3               GenomeInfoDb_1.36.3        
## [177] rstantools_2.3.1.1          DelayedArray_0.26.7        
## [179] tidyselect_1.2.0            vipor_0.4.5                
## [181] htmlTable_2.4.1             xml2_1.3.5                 
## [183] inline_0.3.19               AnnotationDbi_1.62.2       
## [185] munsell_0.5.0               KernSmooth_2.23-21         
## [187] htmlwidgets_1.6.2           fgsea_1.26.0               
## [189] rlang_1.1.1                 spatstat.sparse_3.0-2      
## [191] spatstat.explore_3.2-3      remotes_2.4.2.1            
## [193] fansi_1.0.4                 beeswarm_0.4.0